Control Mechanism for Adaptive Play-Out with State Recovery

ABSTRACT

A control logic means, preferably for a receiver comprising a jitter buffer means adapted to receive and buffer incoming frames or packets and to extract data frames from the received packets, a decoder connected to the jitter buffer means and adapted to decode the extracted data frames, and a time scaling means connected to the decoder and adapted to play out decoded speech frames adaptively. The control logic means comprises knowledge of whether a state recovery function is available and is adapted to retrieve at least one parameter from at least one of the jitter buffer means, the time scaling means, and the decoder, to adaptively control at least one of an initial buffering time of said jitter buffer means, based on the at least one parameter from the jitter buffer means and the knowledge of the availability of the state recovery function, and a time scaling amount of said time scaling means, based on the at least one parameter retrieved from the time scaling means or the decoder and the knowledge of the availability of the state recovery function.

FIELD OF THE INVENTION

The present invention relates generally to packet-based communication systems suitable for transmission of sound signals, and more particularly to buffering techniques for use in such communication systems.

BACKGROUND

Voice over IP is a convergence between the telecom and datacom worlds, wherein the speech signals are carried by data packets, e.g. Internet Protocol (IP) packets. The recorded speech is encoded by a speech codec on a frame-by-frame basis. A data frame is generated for each speech frame. One or several data frames are packed into RTP packets. The RTP packets are further packed into UDP packets and the UDP packets are packed into IP packets. The IP packets are then transmitted from the sending client to the receiving client using an IP network.

A problem associated with packet-based networks is delay jitter. Delay jitter implies that even though packets are transmitted at a regular interval, for example one frame every 20 ms, the packets arrive irregularly at the receiver. Packets may even arrive out of order. The most common reason for receiving packets out of order is that the packets travel different routes, at least for fixed networks. For wireless networks, another reason may be that re-transmission is used. For example: when sending packet N on the uplink (i.e. from the mobile terminal to the base station) there may be bit errors that cannot be corrected, and re-transmission has to be performed. However, the signalling for re-transmissions may be so slow that the next packet in the queue (packet N+1) is sent before packet N is re-transmitted. As a result, the packets are received out of order if packet N+1 is correctly received before the re-transmitted packet N is correctly received.

In VoIP clients, a jitter buffer means is used to equalize delay jitter in the transmission so that the speech samples can be played out at a constant sampling rate, for example one frame every 20 ms. (Play out is in this description used to indicate the transmission of the speech to the sound card.) The fullness level of the jitter buffer means is proportional to the amount of the delay jitter in the packet flow and the objective is to keep the amount of late losses at an acceptable level while keeping the delay as low as possible. The following example explains the importance of keeping the delay as low as possible: a long buffering time in the jitter buffer means increases the end-to-end delay. This reduces the perceived conversational quality because the system will be perceived as “slow”. Long delays increase the risk that the users talk at the same time and may also give the impression that the other user is “slow” (thinking slowly). Further, a late loss is a packet that is properly received but that has arrived too late to be useful for the decoder.

The jitter buffer means stores packets or frames for a certain time. A typical way of defining this is to say that the jitter buffer means is filled up to a certain “level”, denoted the fullness level. This level is often measured in milliseconds instead of the number of frames since the size of the frames may vary. Thus the jitter buffer means level is measured in time. The jitter buffer means level can be set in a number of different ways.
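The re-ordering, the time-based fullness level and the notion of a late loss can be illustrated with a minimal sketch. The class name `JitterBuffer`, its methods and the 20 ms frame duration are assumptions chosen for illustration; they are not taken from any particular implementation.

```python
import heapq

FRAME_MS = 20  # assumed nominal frame duration


class JitterBuffer:
    """Toy jitter buffer: re-orders frames by sequence number and counts late losses."""

    def __init__(self):
        self._heap = []        # min-heap of (sequence number, payload)
        self.next_seq = 0      # next sequence number the decoder expects
        self.late_losses = 0

    def push(self, seq, payload):
        """Store an arriving frame; a frame older than next_seq is a late loss."""
        if seq < self.next_seq:
            self.late_losses += 1
            return False
        heapq.heappush(self._heap, (seq, payload))
        return True

    def level_ms(self):
        """Fullness level expressed in time rather than in number of frames."""
        return len(self._heap) * FRAME_MS

    def pop(self):
        """Return the payload for next_seq, or None if it is missing (lost or late)."""
        # Drop stale entries (e.g. duplicates of frames already played out).
        while self._heap and self._heap[0][0] < self.next_seq:
            heapq.heappop(self._heap)
        payload = None
        if self._heap and self._heap[0][0] == self.next_seq:
            payload = heapq.heappop(self._heap)[1]
        self.next_seq += 1
        return payload


buf = JitterBuffer()
for seq in (0, 2, 1):                    # frames 1 and 2 arrive out of order
    buf.push(seq, f"frame{seq}")
level_before = buf.level_ms()
played = [buf.pop() for _ in range(4)]   # frame 3 has not arrived in time
accepted = buf.push(3, "frame3")         # arrives after its play-out slot
```

In this sketch the fourth play-out slot finds no frame, so error concealment would be needed, and the frame that arrives afterwards is counted as a late loss rather than buffered.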

Fixed size: The fixed size implies that the jitter buffer fullness level is fixed and pre-configured. After a DTX period, the jitter buffer means is initially filled up with a fixed time, e.g. a fixed number of frames (e.g. 5 frames), before speech play-out is resumed. This initial margin is used to give protection against delay jitter and late loss.

Adaptive jitter buffer means size: The jitter buffer fullness level varies with the delay jitter. Similarly to the case of fixed size of the jitter buffer fullness level, an initial number of frames are buffered up before speech play-out is resumed after a DTX period. However, during the active voice (non-DTX) period the fullness level of the jitter buffer means may vary, based on analysis of the incoming packets. It is possible to collect the statistics over several talk spurts. However, one usually resets the jitter buffer fullness level to the “default level” at every speech onset.

Adaptive jitter buffer means size with improved interactivity: In order to reduce the perceived delay, it is possible to initialize the jitter buffer means with a shorter time than for the case with adaptive jitter buffer means size, and the speech play-out is started as soon as the first speech packet is received after DTX. In order to reach the jitter buffer fullness level, time scaling is used to stretch the initial decoded frames so that the packets are extracted from the jitter buffer means at a reduced pace. Time scaling implies that the speech frames are played out adaptively, i.e., that a speech frame that normally contains 20 msec of speech may be stretched so that 30 msec of speech is generated. An alternative to starting play-out after the first received packet is to wait for one or two extra packets. WO-200118790 A1 and US2004/0156397 A1 describe time scaling.
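The relation between the stretch applied to each frame and the time needed to reach the fullness level can be made concrete with a small calculation. The sketch below is idealized: it assumes packets arrive at exactly one nominal frame per frame period (no jitter), and the function name `build_up` and its parameters are hypothetical. While one stretched frame is being played out, `stretched_ms` worth of speech arrives but only `nominal_ms` is consumed, so the buffer gains the difference.

```python
def build_up(target_ms, initial_ms, stretched_ms, nominal_ms=20):
    """Return (frames_played, elapsed_ms) until the buffer level reaches target_ms.

    Idealized model: speech arrives at a steady rate and every played frame is
    stretched from nominal_ms to stretched_ms.
    """
    gain_per_frame = stretched_ms - nominal_ms   # net buffer growth per played frame
    if gain_per_frame <= 0:
        raise ValueError("frames must be stretched for the buffer level to grow")
    deficit = max(0, target_ms - initial_ms)
    frames = -(-deficit // gain_per_frame)       # ceiling division on integers
    return frames, frames * stretched_ms


moderate = build_up(target_ms=60, initial_ms=0, stretched_ms=30)    # 20 ms -> 30 ms
aggressive = build_up(target_ms=60, initial_ms=0, stretched_ms=40)  # 20 ms -> 40 ms
```

Under these assumptions, stretching 20 ms frames to 30 ms reaches a 60 ms level after 6 frames (180 ms), whereas stretching to 40 ms reaches it after 3 frames (120 ms). This is the trade-off between build-up time and time scaling aggressiveness.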

DTX is discontinuous transmission and implies that a special type of information is transmitted on the channel when no voice is present and the input signal contains only (background) noise. The encoder evaluates the background noise and determines a set of parameters that describe the noise (Silence Description, SID, parameters). The SID parameters are transmitted to the receiving terminal so that a similar noise, comfort noise, can be generated. The SID parameters are transmitted less frequently than normal speech frames in order to save power and transmission resources.

FIG. 1 shows an example of initial jitter buffer means operation according to the method of the adaptive jitter buffer means size with improved interactivity. The upper plot shows the jitter buffer fullness level and the lower plot shows frame size. The play-out is started as soon as the first packet is received, at about 0.5 seconds. Time scaling is performed to increase the size of the generated frames and thereby consume frames from the jitter buffer means at a slower than normal pace. The early start of the play-out gives a feeling of improved interactivity, which increases the perceived conversational quality. At the end of the talk-burst, at about 3 seconds, the last speech frames are shortened and played out at a faster pace than normal. This gives a further improved interactivity.

Note that the adaptation of the target jitter buffer means level (60 ms) during the non-DTX period is not shown in FIG. 1; however, this functionality will exist in a typical implementation of the adaptive jitter buffer means size with improved interactivity.

There are however several drawbacks with the three methods described above. The fixed jitter buffer means size gives a quite long delay since a number of packets are always buffered before the play-out starts. This reduces the perceived interactivity.

The adaptive jitter buffer means may adjust the fullness level in order to introduce less delay on average, at least if the channel is varying slowly. The problem with poor interactivity due to long initial buffering time still remains, since the adaptation operates within an ongoing packet flow during active speech, after the flow has started up following a DTX period. It should be noted that this problem occurs if the jitter buffer fullness level is reset to a default level at every speech onset (i.e. at the switching from DTX to speech).

The jitter buffer means initialization, when using the adaptive jitter buffer means size with improved interactivity, improves the interactivity as the perceived initial delay will be lower. One problem is however that the jitter buffer means level is very low in the beginning of a speech burst and there is therefore a risk that delay jitter in the beginning of speech bursts results in late losses. Similarly to frame losses, late losses will reduce the speech quality since the error concealment is activated for the frame that is lost or is received late.

Additionally, the method of the adaptive jitter buffer means size with improved interactivity also implies that the time scaling, to adjust the buffer level up to the normal fullness level, must be done quite fast since the adaptation period must be short enough to avoid being hit by multiple delay spikes. A delay spike is when the delay increases substantially from a first packet to a subsequent packet. This means that the time scaling must be quite aggressive. Aggressive time scaling increases the risk that the time scaling itself introduces distortions. The distortions may be of different kinds: clicks, plops, bursts of noise, but also “funny sounding sound” like “unnatural talking amount”.

For most modern speech codecs (GSM-EFR, GSM-AMR, ITU-T G.729, EVRC, etc.) that use inter-frame prediction to be able to encode the signal at a lower bit rate but with maintained quality, there is an additional problem. Both frame losses and late losses give distortions for the current frame and also for subsequent frames since the error propagates for some time due to the inter-frame prediction. The error propagation time depends on the sound and the codec but may be as long as 5-6 frames (100-120 ms). Late losses are especially critical in the beginning of a speech burst as these parts often contain voiced onsets, which are later used by the adaptive codebook to build up the voiced waveform. The result of a late loss in the beginning of a speech burst is therefore often very audible and can degrade intelligibility considerably.

There are a few methods to compensate for the error propagation that would occur if a late loss occurs during the build-up time, but they all have significant drawbacks. One possibility is to reduce the initial buffering time but not as much as could be done in the optimum case. This would, of course, mean that it is not possible to benefit as much, in terms of interactivity, as would be desired.

Another possibility is to reduce the amount of inter-frame prediction used in the codec. This would however either result in a reduced intrinsic speech quality, since the inter-frame correlation is not exploited to its full potential, or require that the signal is encoded at a higher bit rate, or both.

Due to the drawbacks with the method of adaptive jitter buffer means size with improved interactivity, the method is difficult to use in real systems. For channels that contain very little jitter and preferably also few packet losses it may work well, but for channels that contain a lot of jitter and possibly also give packet losses it is very difficult to get the full gain in improved interactivity. For most practical cases, it would be preferable to have an initialization time of a few frames before the play-out starts.

SUMMARY

An object of the present invention is to achieve a control logic means that improves the interactivity and/or the speech (listening) quality.

The above stated object is achieved by a control logic means and a method according to the independent claim.

Preferred embodiments are defined by the dependent claims.

This invention is based on the possibility to adaptively control at least one of the initial buffering time and time scaling amount to improve the interactivity and/or the speech and listening quality.

That is achieved by the introduction of a control logic means that is adapted to retrieve information from at least one of the jitter buffer means, the decoder, the time scaling means and the state recovery means. The control logic means is further adapted to adaptively control at least one of the initial buffering time and the time scaling amount based on the retrieved information.

Thanks to the introduction of the control logic means, it is possible to make better use of the state recovery means in combination with the initial buffering time. State recovery makes the receiver less sensitive to late losses during the initial jitter buffer means build-up period. It is therefore possible to have a very short initialization time by using aggressive time scaling. This improves the interactivity even further than what is possible with the method of adaptive jitter buffer means size with improved interactivity.

Since the robustness against late losses is increased by means of the state recovery, a longer jitter buffer means build-up period can also be allowed. It is therefore possible to do less aggressive time scaling. This may be advantageous, since time scaling may introduce distortions in the synthesized speech because the performance of the time scaling differs for different sounds.

Since the control logic means is able to combine the initial buffering time, state recovery and time scaling in different ways, an adaptation between these variants improves the performance. This adaptation may be based on either the current channel conditions or the sound signal or both.

The use of time scaling and state recovery results in an increased complexity, and hence in a higher Central Processing Unit (CPU) load. A further advantage of the present invention is that it enables control of the complexity by controlling parameter settings and enabling/disabling of the state recovery. That is achieved by the control logic means, which retrieves information regarding the CPU load and controls the parameter settings or the enabling/disabling of the state recovery means. The retrieved information regarding the CPU load may be associated with time scaling operations and/or state recovery operations.

The listening quality is improved as the improved interactivity gives a lower perceived delay and state recovery improves the quality as the possible late losses due to the aggressive use of time scaling are repaired.

Another advantage is that the control logic means allows for adapting to different operating conditions, such as good channel conditions in an office LAN having only occasional delay spikes and short delays, or bad channel conditions in a heavily loaded cellular network having large jitter and possibly also packet losses and long delay.

BRIEF DESCRIPTION OF THE DRAWINGS

The objects and advantages of this invention will be apparent from reading this description in conjunction with the drawings, in which:

FIG. 1 is a graph showing the operation of the time scaling functionality.

FIG. 2 shows the control logic means in a receiver according to the present invention.

FIG. 3 is a graph illustrating the improved performance with state recovery.

FIGS. 4-6 show the functionality of the improved jitter buffer means build-up according to the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The present invention will be described more fully hereinafter with reference to the accompanying drawings, in which preferred embodiments of the invention are shown. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

An overview of a receiver for a Voice over IP client is shown in FIG. 2. It should be noted that the present invention may also be applicable to Voice over ATM without IP and other systems where delay jitter occurs. The receiver comprises a jitter buffer means connected to a decoder, and the decoder is further connected to a time scaling means. A state recovery means in connection with an Error Concealment (ECU) means may be connected to the jitter buffer means and the decoder. The receiver receives data packets via the jitter buffer means. The packets are unpacked, as the packets may contain data for several speech data frames; if so, this is indicated in the packet payload header. Thus the jitter buffer means is adapted to extract the data frames from the packets. As the data frames may be re-ordered due to delay jitter, the jitter buffer means places the frames in order. From the jitter buffer means it is possible to get information about late losses, frame losses and the current jitter buffer means level. It should be noted that the extraction of data frames may also be performed in another means, in which case the jitter buffer means receives the data frames.

The decoder decodes the data frames into speech frames, i.e. into a sound signal. For instance, in the AMR 12.2 kbps mode the data frames are 244 bits, and each is decoded into 160 16-bit samples (one speech frame).
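The figures in this example are mutually consistent, as the following short calculation illustrates. The 8 kHz sampling rate is an assumption added here (it is the usual narrowband AMR rate but is not stated in the text above); the variable names are illustrative only.

```python
FRAME_BITS = 244            # encoded data frame size, from the text
SAMPLES_PER_FRAME = 160     # decoded samples per speech frame, from the text
BITS_PER_SAMPLE = 16        # from the text
SAMPLE_RATE_HZ = 8000       # assumption: narrowband sampling rate

frame_ms = 1000 * SAMPLES_PER_FRAME / SAMPLE_RATE_HZ    # duration of one speech frame
bit_rate_kbps = FRAME_BITS / frame_ms                   # bits per millisecond equals kbit/s
pcm_bits = SAMPLES_PER_FRAME * BITS_PER_SAMPLE          # size of the decoded frame in bits
compression_ratio = pcm_bits / FRAME_BITS               # decoded size relative to encoded size
```

With these values one frame lasts 20 ms, which matches the frame interval used throughout this description, and 244 bits every 20 ms gives the 12.2 kbps mode rate.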

The time scaling means has the ability to compress or expand the size of the decoded speech frames coming from the decoder; for instance, the 160 samples from the speech decoder can be expanded to 240 samples, compressed to 80 samples, or expanded/compressed to some other frame size. The time scaling means provides information regarding the achieved compression or expansion. Time scaling performs differently for different sound signals. Some sound signals are quite easy to scale in time, and time scaling then introduces little or no distortion. Examples of such sounds are stationary voiced segments, unvoiced segments and background noise. Other sound signals are quite hard to scale in time, and time scaling would then probably introduce quite audible distortions. Examples of sounds that are difficult to scale are transient sounds (that go from unvoiced to voiced), plosives (“t”, “p”, “b”, . . . ) and speech onsets (that go from background noise (DTX) to speech). Therefore, it is desired to retrieve information from either the speech decoder or the time scaling function to be able to decide how aggressive the time scaling should be. Other examples of parameters or measures that describe sound properties or channel characteristics and that are desired to retrieve, alone or in combination, to adapt the initial buffering time or the time scaling amount are adaptive codebook (ACB) gain, fixed codebook (FCB) gain, LTP/ACB lag measures and characteristics, LSP coefficient characteristics, spectral flatness measure, spectrum variations, and energy measures and variations.

Examples of different methods for performing time scaling that may be used in the present invention are described in the patent application WO 01/93516 and in the U.S. Pat. No. 6,873,954. It should be noted that the above mentioned time scaling means may also be located in the transmitter, and the time scaling may thus be performed before the encoding operation. If the time scaling is performed in the transmitter, some information must be exchanged between the encoder and decoder.

The decoding means and the time scaling means may also be integrated into one unit. In such an integrated unit, the time scaling is performed on the excitation before the synthesis filter, and the synthesis filter is then applied to more or fewer samples than in the normal case.

The time scaling means is further connected to a sample buffer. The time scaled frames are transferred to the sample buffer. One or a plurality of samples, wherein a frame is a plurality of consecutive samples, are sent from the sample buffer to the sound card of the speaker as long as the sample buffer is filled to a pre-defined threshold level. If the sample buffer is not filled, further decoding operations are requested. Because decoding is thus requested on demand rather than at a fixed frame rate, frames of varying length can be accommodated, which makes the introduction of time scaling possible.
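The demand-driven relationship between the sample buffer and the decoder can be sketched as follows. The function `play_out` and its parameters are hypothetical, and the sketch assumes the threshold is at least as large as the chunk delivered to the sound card; the decoder is represented by a callable that returns one (possibly time-scaled) frame of samples.

```python
from collections import deque


def play_out(decode_frame, threshold, chunk, n_chunks):
    """Deliver n_chunks chunks of `chunk` samples, requesting decoding on demand.

    decode_frame: callable returning a list of samples (one time-scaled frame).
    threshold:    minimum number of buffered samples before play-out (>= chunk).
    """
    sample_buffer = deque()
    decodes = 0
    delivered = []
    for _ in range(n_chunks):
        # Request further decoding operations until the buffer is filled.
        while len(sample_buffer) < threshold:
            sample_buffer.extend(decode_frame())
            decodes += 1
        delivered.append([sample_buffer.popleft() for _ in range(chunk)])
    return delivered, decodes


# Normal 160-sample frames versus frames expanded to 240 samples.
_, decodes_normal = play_out(lambda: [0] * 160, threshold=160, chunk=160, n_chunks=3)
_, decodes_stretched = play_out(lambda: [0] * 240, threshold=160, chunk=160, n_chunks=3)
```

With expanded frames fewer decoding operations are requested for the same amount of audio delivered to the sound card, so frames are consumed from the jitter buffer means at a reduced pace.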

According to the present invention, a control logic means is introduced to retrieve information such as knowledge about an existing state recovery function, channel characteristics, sound properties, caused distortion (comparison of distortion before and after time scaling) and the achieved time scaling. Information about an existing state recovery function can be pre-configured in the control logic means, or information about enabling/disabling of a state recovery means may be fetched from the state recovery means. Channel characteristics can be fetched from the jitter buffer means, sound properties and distortion information can be fetched from the decoder, and distortion information and achieved time scaling can be fetched from the time scaling means. Thus the control logic means is required to have knowledge of whether a state recovery function is available, and the control logic means is adapted to retrieve information from at least one of the jitter buffer means, the state recovery means, the decoder and the time scaling means. The control logic means may also be used to control the jitter buffer fullness level.

The control logic means is then adapted to adaptively control at least one of: the initial buffering time of the jitter buffer means, based on the retrieved information from the jitter buffer means and the knowledge of the availability of the state recovery function; and the time scaling settings of the time scaling means, based on the retrieved information from the time scaling means or the decoder in combination with the knowledge of the availability of the state recovery function. The control logic means is preferably adapted to perform this control on a per-frame basis.

The state recovery means provides a state recovery function. The state recovery function repairs late losses and improves on pure error concealment. The function is described in the U.S. Pat. No. 6,721,327 B1.

When a frame is not received, either because it is lost or because it is not received in time (i.e. received too late), the error concealment means will be activated to try to conceal the error. However, using the error concealment provides erroneous start-up states for the subsequent frame. A frame that is received but not in time to be useful for the synthesis is still usable to correct the states at the frame boundaries, before the subsequent frame is decoded and synthesized. This is done according to the state recovery method by performing an additional decoding using decoder states that are reverted back to the state before the late loss. The decoding is done using the correctly received parameters, resulting in a corrected decoder state. The audio samples from the additional decoding are discarded as they are too late to be played out. The states after the error concealed frame are either replaced by or combined with the states from the additional decoding to create improved states that are more suitable for the subsequent frame. As a result, the error propagation time is reduced.
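The principle can be illustrated with a deliberately simplified model. The `ToyDecoder` below is not a speech codec: it is a hypothetical one-tap predictor whose output depends on the previous output, which is enough to show error propagation and its repair. The sketch uses the "replace" variant (the concealed state is replaced by the state from the additional decoding) and assumes the late frame arrives before the subsequent frame is decoded.

```python
class ToyDecoder:
    """Toy predictive decoder: each output depends on the previous one via `state`."""

    def __init__(self):
        self.state = 0.0

    def decode(self, param):
        self.state = 0.5 * self.state + param
        return self.state

    def conceal(self):
        # Error concealment: no parameter is available, so decay the previous output.
        self.state = 0.5 * self.state
        return self.state


def run(params, late_index=None, state_recovery=False):
    """Decode all params; the frame at late_index misses its play-out time."""
    dec = ToyDecoder()
    played = []
    for i, p in enumerate(params):
        if i == late_index:
            saved = dec.state              # decoder state before the late loss
            played.append(dec.conceal())   # the concealed audio is what gets played
            if state_recovery:
                # Additional decoding from the reverted state with the late but
                # correctly received parameter; its audio is discarded and only
                # the corrected state is kept for the subsequent frame.
                dec.state = saved
                dec.decode(p)
        else:
            played.append(dec.decode(p))
    return played


params = [1.0, 2.0, 3.0, 4.0]
clean = run(params)
unrepaired = run(params, late_index=1)
repaired = run(params, late_index=1, state_recovery=True)
```

In this toy model the concealed frame itself is still wrong in both cases, but with state recovery every frame after the late loss is identical to the error-free output, whereas without it the error propagates into the following frames.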

State recovery improves the performance even if multiple consecutive late losses occur; however, due to decoding complexity constraints it is preferable in an implementation to use state recovery only for single or very few late losses, so that it does not result in a decoder complexity overload (which in turn results in a CPU overload). FIG. 3 shows the advantage that the state recovery gives. The upper graph of FIG. 3 shows an undistorted waveform, the middle waveform is distorted by late losses and the lower waveform is distorted by late losses but repaired by state recovery. It should be noted that the waveforms and timing differ slightly as time scaling is involved. It can be seen that the speech in the middle graph is attenuated and distorted over a longer period of time, which results in a bad speech quality. Thus, the state recovery improves the performance by making the system more robust against late losses but increases the decoding complexity by the required additional decoding.

The method and arrangements of the present invention improve the perceived speech quality during the jitter buffer means build-up phase. The improved jitter buffer means build-up phase is described in FIGS. 4 to 6.

The graphs in FIGS. 4-6 show the jitter buffer means level on the vertical axis and the time on the horizontal axis. The initial buffering time, the build-up time and the fullness level are indicated. The initial buffering time is the time (or size of the received frames in the buffer) before the frames are transferred further to the decoder, and the build-up time is the time required to reach the jitter buffer fullness level. In FIG. 4, the dashed line shows the jitter buffer fullness level for the method of buffer with improved interactivity. The solid line shows the jitter buffer fullness level with the method according to the present invention, wherein the control logic means controls the initial buffering time and the time scaling amount, which affect the build-up time. This control is based on the late loss probability during the build-up time and the existence of state recovery.

State recovery enables further reduction of the initial buffering time since the state recovery makes the receiver more robust to late losses. The control logic means according to the present invention enables adaptation of the reduced initial buffering time based on the existence/non-existence of state recovery. Since it is possible to reduce the initial buffering time even further, the perceived interactivity is improved beyond what is achieved in the prior art.

How well the time scaling works depends on the sound properties of the decoded speech frames, as described above. For some sounds the time scaling introduces distortion and for some sounds the time scaling works very well. An analysis of the sound property may be used by the control logic means according to the present invention to decide how aggressive the time scaling should be, i.e. to adapt the time scaling to the current situation. A very aggressive time scaling makes it possible to have a very short jitter buffer means build-up period, which reduces the risk of being hit by a delay spike. An aggressive time scaling is illustrated in FIG. 5. The short build-up time is beneficial since the state recovery is an error concealment method and gives better, but not perfect, states, which implies that late losses may still impact the performance. If the channel has severe delay jitter characteristics and if state recovery is not available, then it is required to use a very aggressive time scaling amount to increase the content in the jitter buffer means very rapidly. If state recovery is available, then the aggressiveness can be controlled depending on how well the time scaling performs for the current speech segment. For the sounds where the time scaling does not work that well, the control logic means will trigger a less aggressive time scaling, which gives a longer build-up time. Examples of parameters or measures that describe how well the time scaling performs, and that the control logic means may use alone or in combination with other parameters/measures to control the time scaling aggressiveness, are spectrum error, energy differences and pitch matching error between the signal before and after the time scaling operation. The less aggressive time scaling is illustrated in FIG. 6. In this case, the control logic means can preferably enable the state recovery function to reduce the impact of late losses.
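One possible way to express this decision per frame is sketched below. The function name, the quality score in the range 0 to 1, the stretch limits of 25 ms and 40 ms, and the fallback branch are all assumptions made for illustration; the description above does not prescribe specific values or a specific mapping.

```python
def choose_stretch_ms(state_recovery_available, severe_jitter, scaling_quality,
                      min_stretch_ms=25, max_stretch_ms=40):
    """Pick the play-out length (ms) of one nominal 20 ms frame during build-up.

    scaling_quality: hypothetical score in [0, 1], where 1.0 means the current
    sound scales cleanly (e.g. low spectrum, energy and pitch matching error).
    """
    if severe_jitter and not state_recovery_available:
        # No repair of late losses is possible: fill the buffer as fast as possible.
        return max_stretch_ms
    if state_recovery_available:
        # Late losses can be repaired: let the sound decide the aggressiveness.
        q = min(1.0, max(0.0, scaling_quality))
        return min_stretch_ms + round(q * (max_stretch_ms - min_stretch_ms))
    # Assumed fallback for a benign channel without state recovery.
    return (min_stretch_ms + max_stretch_ms) // 2


no_recovery_bad_channel = choose_stretch_ms(False, True, 0.1)
recovery_easy_sound = choose_stretch_ms(True, True, 1.0)
recovery_hard_sound = choose_stretch_ms(True, True, 0.0)
```

In this sketch a hard-to-scale sound with state recovery available gets the mildest stretch and therefore the longest build-up time, corresponding to the less aggressive strategy, while a severe channel without state recovery always gets the maximum stretch.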

Since both the channel properties and the speech signal vary over time, it is beneficial to have a control logic means that adapts between the above mentioned jitter buffer means build-up strategies illustrated in FIGS. 4-6. Further, if the channel is varying rapidly, then it is beneficial to have a short build-up period since this reduces the risk of being hit by one or several delay spikes. This means that statistics of the channel behaviour must be collected, e.g. from the jitter buffer means, so that the statistics can be used by the control logic means in order to adapt the time scaling amount accordingly.

State recovery introduces extra decoding complexity, which results in a higher CPU load, since additional decoding operations are performed. The extra decoding operations are required since the decoder state is reverted to the assumed state before the late loss and decoding is done using the correctly received but delayed parameters. The number of extra decoding operations is proportional to how late the late frame is. If the frame is late by one frame, one extra state decoding is needed. In order to reduce the complexity it is not necessary to run the synthesis filter and post-filter. The synthesis filter and post-filter states are therefore not recovered. This is possible as the objective of the state recovery is only to recover states that would otherwise take a long time to repair. This covers the parts that are involved in the update of the adaptive codebook (pitch gain, pitch lag, fixed codebook gain, fixed codebook). This means that the complexity increase is roughly halved.

An extra ECU decoding is needed to avoid the discontinuity between the previous error concealed frame and the newly decoded good frame that is decoded using the recovered decoder state. An overlap period of about 5 to 20 ms is needed to give a smooth transition between the two decoded signals (overlap-and-add). Thus the state recovery increases the decoding complexity and hence the CPU load. Therefore, there may be cases where the total complexity may reach beyond what the CPU can handle. It is therefore necessary to control the decoding complexity and the CPU load accordingly. The control logic means according to one embodiment of the present invention is adapted to retrieve information regarding the CPU load in order to know when the state recovery means should be enabled/disabled due to the CPU load.
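The overlap-and-add transition can be sketched as a crossfade over the overlap period. The linear weighting used here is an assumption (the description does not specify a window shape), and the function name is hypothetical. At the assumed 8 kHz sampling rate an overlap of 5 to 20 ms corresponds to 40 to 160 samples.

```python
def overlap_add(concealed_tail, recovered_head):
    """Linearly crossfade from the error concealed signal into the newly decoded one.

    Both inputs cover the same overlap period and must have equal, non-zero length.
    """
    n = len(concealed_tail)
    if n == 0 or n != len(recovered_head):
        raise ValueError("overlap segments must be non-empty and of equal length")
    mixed = []
    for i in range(n):
        w = (i + 1) / (n + 1)   # weight of the new signal rises across the overlap
        mixed.append((1.0 - w) * concealed_tail[i] + w * recovered_head[i])
    return mixed


# Tiny example: fade from a signal at 0.0 to a signal at 1.0 over three samples.
mixed = overlap_add([0.0, 0.0, 0.0], [1.0, 1.0, 1.0])

# A 5 ms overlap at the assumed 8 kHz sampling rate.
overlap_samples = 8000 * 5 // 1000
smooth = overlap_add([0.0] * overlap_samples, [1.0] * overlap_samples)
```

The mixed signal moves gradually from the concealed signal to the signal decoded with the recovered state, which avoids an abrupt discontinuity at the frame boundary.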

Furthermore, the use of time scaling also introduces increased complexity and hence increased CPU load. The control logic means may monitor the total complexity used by the time scaling means, and adjust the complexity used by the state recovery means. For example, if it is found that the time scaling means is using a lot of resources, the state recovery can be limited to a smaller number of parameters, or performed with a lower resolution. Alternatively, one can reduce the overlap length in the synthesis mixing operation. The control logic means may even adjust the speech parameters used in normal decoding to simplify the synthesis step (e.g. force the use of integer pitch lags, or even totally shut down the ACB-excitation extraction).
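A minimal sketch of such load-based control is given below. The three modes, the percentage costs and the budget are hypothetical values chosen for illustration; a real system would obtain the load figures from its own CPU load measurements.

```python
def state_recovery_mode(base_load_pct, time_scaling_load_pct,
                        full_cost_pct=20, reduced_cost_pct=10, budget_pct=100):
    """Choose how much state recovery to run given the current CPU load.

    Returns "full", "reduced" (e.g. fewer parameters or lower resolution)
    or "off" (state recovery disabled).
    """
    headroom = budget_pct - base_load_pct - time_scaling_load_pct
    if headroom >= full_cost_pct:
        return "full"
    if headroom >= reduced_cost_pct:
        return "reduced"
    return "off"


light = state_recovery_mode(base_load_pct=50, time_scaling_load_pct=10)
busy = state_recovery_mode(base_load_pct=60, time_scaling_load_pct=25)
overloaded = state_recovery_mode(base_load_pct=70, time_scaling_load_pct=25)
```

In this sketch, heavy use of time scaling reduces the headroom left for state recovery, so the control logic first falls back to a reduced form of state recovery and only disables it when even that would exceed the budget.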

With a tight control of the complexity usage for the different receiver parts, the receiver may use its cycles where they are most needed to provide the highest interactivity possible within a given minimum speech quality and a given maximum complexity allowance. This control is useful for strictly cycle-limited embedded systems, for example within a cellular platform. It should be noted that the complexity may be equally limited elsewhere in the system, e.g. in the Media Gateway (MGW). Thus the retrieved CPU load related information may also concern the MGW CPU load, or another system CPU load.

Thus, the present invention relates to a control logic means connectable to jitter buffer means adapted to receive and buffer incoming frames or packets and to extract data frames from the received packets, to decoding means connected to the jitter buffer means adapted to decode the extracted data frames, and to time scaling means adapted to play out decoded speech frames adaptively. The control logic means further comprises knowledge of whether a state recovery function is available and is adapted to retrieve at least one parameter from at least one of the jitter buffer means, the time scaling means, and the decoding means, to adaptively control at least one of an initial buffering time of said jitter buffer means based on the at least one parameter from the jitter buffer means and the knowledge of the availability of the state recovery function, and a time scaling amount of said time scaling means based on the at least one retrieved parameter from the time scaling means or the decoder and the knowledge of the availability of the state recovery function. The control logic means is preferably implemented in a receiver of a VoIP client.

The present invention also relates to a method. The method comprises the steps of:

1. Obtain knowledge of whether a state recovery function is available.

2. Retrieve at least one parameter from at least one of the jitter buffer means, the time scaling means, and the decoder to adaptively control at least one of an initial buffering time of said jitter buffer means based on the at least one parameter from the jitter buffer means and the knowledge of the availability of the state recovery function, and a time scaling amount of said time scaling means based on the at least one retrieved parameter from the time scaling means or the decoder and the knowledge of the availability of the state recovery function.
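
The two steps above may be illustrated by the following minimal sketch. It assumes, for illustration only, that an available state recovery function permits a shorter initial buffering time and a larger time scaling amount per frame. This policy, the names `Parameters`, `Control` and `control`, and all constants are hypothetical. The 20 ms frame interval is the example interval mentioned in the background.

```python
import math
from dataclasses import dataclass

FRAME_MS = 20  # example frame interval


@dataclass
class Parameters:
    """Hypothetical parameters retrieved by the control logic means."""
    jitter_ms: float        # channel characteristic from the jitter buffer means
    buffer_level_ms: float  # speech currently held in the jitter buffer
    target_level_ms: float  # desired buffer level
    distortion: float       # 0..1, distortion reported by the time scaling means


@dataclass
class Control:
    initial_buffering_ms: float
    time_scaling_ms: float  # positive stretches play-out, negative compresses it


def control(params: Parameters, state_recovery_available: bool) -> Control:
    """Derive both control values from the parameters and the knowledge
    of whether state recovery is available (assumed policy)."""
    # Initial buffering: cover the observed jitter in whole frames, and
    # halve that number of frames if state recovery is available.
    frames = math.ceil(params.jitter_ms / FRAME_MS)
    if state_recovery_available:
        frames //= 2
    frames = max(1, frames)

    # Time scaling: move toward the target level, limited per frame and
    # reduced as the reported distortion grows.
    share = 0.5 if state_recovery_available else 0.25
    limit = FRAME_MS * share * (1.0 - params.distortion)
    deficit = params.target_level_ms - params.buffer_level_ms
    scaling = max(-limit, min(limit, deficit))

    return Control(initial_buffering_ms=frames * FRAME_MS,
                   time_scaling_ms=scaling)
```

Calling `control` once per frame with freshly retrieved parameters would correspond to a per-frame adaptation of the initial buffering time and the time scaling amount.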

The method may be implemented by a computer program product. Such a computer program product may be directly loadable into a processing means in a computer, comprising the software code means for performing the steps of the method.

The computer program product may be stored on a computer usable medium, comprising a readable program for causing a processing means in a computer to control the execution of the steps of the method.

In the drawings and specification, there have been disclosed typical preferred embodiments of the invention and, although specific terms are employed, they are used in a generic and descriptive sense only and not for purposes of limitation, the scope of the invention being set forth in the following claims.

1. A control logic means connectable to jitter buffer means adapted to receive and buffer incoming frames or packets and to extract data frames from the received packets, to decoding means connected to the jitter buffer means adapted to decode the extracted data frames, to time scaling means adapted to play out decoded speech frames adaptively, the control logic means comprises knowledge of whether a state recovery function is available and that the control logic means is adapted to retrieve at least one parameter from at least one of the jitter buffer means, the time scaling means, and the decoding means, to adaptively control at least one of an initial buffering time of said jitter buffer means based on the at least one parameter from the jitter buffer means and the knowledge of the availability of the state recovery function, and a time scaling amount of said time scaling means based on the at least one retrieved parameter from the time scaling means or the decoder and the knowledge of the availability of the state recovery function.
2. The control logic means according to claim 1, wherein the retrieved parameter from the jitter buffer means relates to channel characteristics.
3. The control logic means according to claim 1, wherein the retrieved parameter from the decoding means relates to sound characteristics.
4. The control logic means according to claim 1, wherein the retrieved parameter from the time scaling means relates to at least one of sound characteristics, distortion information and achieved time scaling.
5. The control logic means according to claim 1, wherein it is adapted to retrieve a further parameter relating to the CPU load and adapted to further adaptively control at least one of an initial buffering time of said jitter buffer means, and a time scaling amount of said time scaling means based on the retrieved parameter.
6. The control logic means according to claim 1, wherein it is adapted to retrieve information relating to the CPU load and adapted to adaptively control state recovery means based on said CPU load related information.
7. The control logic means according to claim 6, wherein the retrieved information relating to the CPU load is associated with time scaling operations.
8. The control logic means according to claim 6, wherein the retrieved information relating to the CPU load is associated with time recovery operations.
9. The control logic means according to claim 6, characterized in that the state recovery means is adaptively enabled/disabled based on said CPU load related information.
10. The control logic means according to claim 6, wherein the state recovery is adaptively limited to a fewer number of parameters or performed with a lower resolution based on said CPU load related information.
11. The control logic means according to claim 1, wherein it is adapted to adaptively control per frame basis at least one of an initial buffering time of said jitter buffer means, a time scaling amount of said time scaling means and the state recovery means.
12. A method for controlling a jitter buffer means adapted to receive and buffer incoming frames or packets and to extract data frames from the received packets, wherein a decoder is connected to the jitter buffer means adapted to decode the extracted data frames, and for controlling a time scaling means connected to the decoder adapted to play out decoded speech frames adaptively, the method comprises the steps of: obtaining knowledge of whether a state recovery function is available, retrieving at least one parameter from at least one of the jitter buffer means, the time scaling means, and the decoder, controlling adaptively at least one of an initial buffering time of said jitter buffer means based on the at least one parameter from the jitter buffer means and the knowledge of the availability of the state recovery function, and a time scaling amount of said time scaling means based on the at least one retrieved parameter from the time scaling means or the decoder and the knowledge of the availability of the state recovery function.
13. The method according to claim 12, wherein the retrieved parameter from the jitter buffer means relates to channel characteristics.
14. The method according to claim 12, wherein the retrieved parameter from the decoder relates to sound characteristics.
15. The method according to claim 12, wherein the retrieved parameter from the time scaling means related to at least one of sound characteristics, distortion information and achieved time scaling.
16. The method according to claim 12, comprising the further step of: retrieving a further parameter relating to the CPU load and controlling adaptively at least one of an initial buffering time of said jitter buffer means, and a time scaling amount of said time scaling means based on the retrieved parameter.
17. The method according to claim 12, comprising the further step of: retrieving information relating to the CPU load and controlling adaptively the state recovery means based on said CPU load related information.
18. The method according to claim 17, wherein the retrieved information relating to the CPU load is associated with time scaling operations.
19. The method according to claim 17, wherein the retrieved information relating to the CPU load is associated with time recovery operations.
20. The method according to claim 17, wherein the state recovery means is adaptively enabled/disabled based on said CPU load related information.
21. The method according to claim 17, wherein the state recovery is adaptively limited to a fewer number of parameters, or performed with a lower resolution based on said CPU load related information.
22. The method according to claim 12, comprising the step of: controlling adaptively per frame basis at least one of an initial buffering time of said jitter buffer means, a time scaling amount of said time scaling means and the state recovery means.
23. A computer program product directly loadable into the internal memory of a computer within a receiver of a packet based communication system, comprising the software code portions for performing the steps of claim 12.
24. A computer program product stored on a computer usable medium, comprising readable program for causing a computer, within a receiver of a packet based communication system, to control an execution of the steps of claim 12.